%genera[w86,jmc] Generality in AI, for old Turing lecture
\input memo.tex[let,jmc]
\overfullrule=0pt
\centerline{\bf GENERALITY IN ARTIFICIAL INTELLIGENCE}
\smallskip
\centerline{by John McCarthy, Stanford University}
\medskip
My 1971 Turing Award Lecture was entitled ``Generality in
Artificial Intelligence''. The topic turned out to have been
overambitious in that I discovered I was unable to put my thoughts on
the subject in satisfactory written form at that time. It would have
been better to review my previous work rather than attempt something
new, but that wasn't my custom then.
I am grateful to the ACM for the opportunity to try again.
Unfortunately for our science, although perhaps fortunately for this
project, the problem of generality in AI is almost as unsolved as ever,
although we now have many ideas not available in 1971. This paper
relies heavily on such ideas, but it is far from a full 1986 survey
of approaches for achieving generality. Ideas are discussed
at a length
proportional to my familiarity with them rather than according to
some objective criterion.
It was obvious in 1971 and even in 1958 that AI programs
suffered from a lack of generality. It is still obvious, and now there
are many more details. The first gross symptom is that a small addition to the
idea of a program often involves a complete rewrite beginning with the
data structures. Some progress has been made in modularizing data
structures, but small modifications of the search strategies are even less
likely to be accomplished without rewriting.
Another symptom is that no-one knows how to make a general database
of common sense knowledge that could be used by any program that
needed the knowledge. Along with other information, such a
database would contain what a robot would need to know about the
effects of moving objects around, what a person can be expected to
know about his family, and the facts about buying and selling.
This doesn't depend on whether the knowledge is to be expressed
in a logical language or in some other formalism.
When we take the logic approach to AI, lack of generality shows up in that
the axioms we devise to express common sense knowledge are too restricted
in their applicability for a general common sense database.
In my opinion, getting a language for expressing general common
sense knowledge for inclusion in a general database is the key
problem of generality in AI.
Here are some ideas for achieving generality proposed both
before and after 1971. I repeat my disclaimer of comprehensiveness.
%
\medskip
\noindent{\bf Representing behavior by program:}
Friedberg [1958 and 1959] discussed a completely general way of
representing behavior and provided a way of learning to improve it.
Namely, the behavior is represented by a computer program and
learning is accomplished by making random modifications to the
program and testing the modified program. The Friedberg approach
was successful in learning only how to move a single bit from
one memory cell to another, and its scheme of rewarding instructions
involved in successful runs by reducing their probability of modification
was shown by Simon [1960] to be inferior to testing each program thoroughly
and completely scrapping any program that wasn't perfect.
No-one seems to have attempted
to follow up the idea of learning by modifying whole programs.
The defect of the Friedberg approach is that while representing
behaviors by programs is entirely general, modifying behaviors by
small modifications to the programs is very special. A small
conceptual modification to a behavior is usually not represented
by a small modification to the program, especially if machine
language programs are used and any one small modification to the
text of a program is considered as likely as any other.
It might be worth trying
something more analogous to genetic
evolution; duplicates of subroutines would be made, some
copies would be modified and others left unchanged. The learning
system would then experiment whether it was advantageous to change
certain calls of the original subroutine to calls of the modified
subroutine. Most likely even this wouldn't work unless the
relevant small modifications of behavior were obtainable by
calls to slightly modified subroutines. It would probably be
necessary to provide for modifications to the number of arguments
of subroutines.
While Friedberg's problem was learning from experience, all
schemes for representing knowledge by program suffer from similar
difficulties when the object is to combine disparate knowledge or to make
programs that modify knowledge.
\medskip
\noindent {\bf The General Problem Solver (GPS) and its Successors:}
One kind of generality in AI comprises
methods for finding solutions that are independent
of the problem domain. Allen Newell, Herbert Simon and their colleagues
and students pioneered this approach and continue to pursue it.
Newell and Simon first proposed GPS in their [1957]. The
initial idea was to represent problems of some general class as problems
of transforming one expression into another by means of a set
of allowed rules. It was even suggested in their [1960]
that improving GPS could
be thought of as a problem of this kind. In my opinion, GPS was unsuccessful
as a general problem solver, because problems don't take this
form in general and because most of the common sense knowledge
needed for problem solving and achieving goals is not simply
representable in the form of rules for transforming expressions.
However, GPS was the first system to separate the problem solving
structure of goals and subgoals from the particular domain.
If GPS had worked out to be really general, perhaps the Newell and
Simon predictions about rapid success for AI would have been realized.
Newell's current [1986] candidate for general
problem representation is SOAR, which, as I understand it, is concerned
with transforming one state to another, where the states need not
be represented by expressions.
\medskip
\noindent {\bf Production Systems:}
The first production systems were done by Newell and Simon
in the 1950s, and the idea was written up in their [1972].
A kind of generality is achieved by using the same goal seeking mechanism
for all kinds of problems, changing only the particular productions.
The early production systems have grown into the current proliferation
of expert system shells.
Production systems represent knowledge in the form of facts
and rules, and there is almost always a sharp syntactic distinction
between the facts and the rules. The facts usually correspond to ground instances
of logical formulas, i.e. they correspond to predicate symbols applied
to constant expressions. Unlike logic-based systems, these facts
contain no variables or quantifiers. New facts are produced by
inference, observation and user input. Variables are reserved for
rules, which usually take a pattern-action form. Rules are put
in the system by the programmer or ``knowledge engineer'' and in
most systems cannot arise via the action of the system. In exchange
for accepting these limitations, the production system programmer
gets a relatively fast program.
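Schematically, a production system might contain ground facts
and a pattern--action rule like the following (an invented example
in the spirit of MYCIN, not the syntax of any particular shell):
%
$$\displaylines{
\qquad gram(org1,positive).\qquad morphology(org1,chains).\hfill\cr
\qquad gram(X,positive),\ morphology(X,chains)\hfill\cr
\qquad\qquad ⊃\ add(identity(X,streptococcus)).\hfill\cr}$$
%
The facts are ground; the variable $X$ appears only in the rule,
and a successful match substitutes $org1$ for $X$, adding the
ground fact $identity(org1,streptococcus)$.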
Production system programs rarely use fundamental knowledge
of the domain. For example, MYCIN [Buchanan and Shortliffe 1984]
has many rules about how to infer which bacterium is causing an
illness based on symptoms and the results of laboratory tests. However,
its formalism has no way of expressing the fact that bacteria are
organisms that grow within the body. In fact MYCIN has no way of
representing processes occurring in time, although other production
systems can represent processes at about the level of the situation
calculus to be described in the next section.
The result of a production system pattern match is a substitution
of constants for variables in the pattern part of the rule. Consequently
production systems do not infer general propositions. For example,
consider the definition that a container is sterile if it is
sealed against entry by bacteria, and all the bacteria in it are dead.
A production system (or a logic program) can only use this fact
by substituting particular bacteria for the variables. Thus it
cannot reason that heating a sealed container will sterilize it
given that a heated bacterium dies, because it cannot reason
about the unenumerated set of bacteria in the container. These
matters are discussed further in [McCarthy 1983].
%
\medskip
\vfil\eject
\noindent {\bf Representing knowledge in logic:}
It seemed to me in 1958 that small modifications in behavior are
most often representable as small modifications in beliefs about
the world, and this requires a system that represents beliefs
explicitly.
[McCarthy 1960] said, {\it ``If one wants a machine to be able to
discover an abstraction, it seems most likely that the machine must be
able to represent this abstraction in some relatively simple way''}.
The 1960 idea for increasing generality was to
use logic to express facts in a way independent of the way the facts
might subsequently be used. It seemed then and still seems that
humans communicate mainly in declarative sentences rather than in programming
languages for good objective reasons that will apply whether the
communicator is a human, a creature from Alpha Centauri or a computer
program. Moreover, the advantages of declarative information also
apply to internal representation. The advantage of declarative
information is one of generality. The fact that when two objects
collide they make a noise may be used in particular situations to make
a noise, to avoid making noise, to explain a noise or to explain the
absence of noise. (I guess those cars didn't collide, because while
I heard the squeal of brakes, I didn't hear a crash.)
Once one has decided to build an AI system that represents
information declaratively, one still has to decide what kind of
declarative language to allow. The simplest systems allow only
constant predicates applied to constant symbols, e.g.
$on(Block1,Block2)$. Stepping up in generality,
one can allow arbitrary constant terms,
built from function symbols, constants and predicate symbols, e.g.
$location(Block1) = top(Block2)$. Prolog databases
allow arbitrary Horn clauses that include free variables, e.g.\hfil\break
$P(x,y) ∧ Q(y,z) ⊃ R(x,z)$, written here in standard logical
notation rather than in Prolog notation. Beyond that lies full first order logic
including both existential and universal quantifiers and
arbitrary first order formulas. Within first order logic,
the expressive power of a theory depends on what domains the
variables are allowed to range over. Important expressive power
comes from using set theory which contains expressions for
sets of any objects in the theory.
Every increase in expressive power carries a price in
the required complexity of the reasoning and problem solving
programs. To put it another way, accepting limitations on
the expressiveness of one's declarative information allows
simplification of the search procedures. Prolog represents
a local optimum in this continuum, because Horn clauses
are moderately expressive but can be interpreted directly
by a logical problem solver.
One major limitation that is usually accepted is to
limit the derivation of new facts to formulas without variables,
i.e. to substitute constants for variables and then do propositional
reasoning. It appears that most human daily activity involves
only such reasoning. In principle, Prolog goes slightly beyond
this, because the expressions found as values of variables
by Prolog programs can themselves involve free variables.
However, this facility is rarely used except for intermediate
results.
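For example, the Prolog query
%
$$?\hbox{-}\ length(L,2).$$
%
\noindent succeeds with the answer $L = [U,V]$, a list of two
distinct free variables that later goals may instantiate.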
What can't be done without more of predicate calculus than
Prolog allows is universal generalization. Consider the rationale
of canning. We say that a container is sterile if it is sealed
and all the bacteria in it are dead. This can be expressed
as a fragment of a Prolog program as follows.
%
$$\displaylines{
\qquad sterile(X)\ \hbox{:-}\ sealed(X),\ \hbox{not}\ alive\hbox{-}bacterium(Y,X).\hfill\cr
%
\qquad alive\hbox{-}bacterium(Y,X)\ \hbox{:-}\
in(Y,X),\ bacterium(Y),\ alive(Y).\hfill\cr}$$
%
However, a Prolog program incorporating this fragment
directly can sterilize a container only by killing each
bacterium individually and would require that some other
part of the program successively generate the names of the bacteria.
It cannot be used to discover or rationalize canning --- sealing
the container and then heating it to kill all the bacteria at once.
The reasoning rationalizing canning involves the use of quantifiers
in an essential way.
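To make the limitation concrete, here is a sketch of how the
fragment might be completed into a runnable program; the constants
$jar$, $b1$ and $b2$ and the predicate $dead$ are invented for the
illustration.
%
$$\displaylines{
\qquad sealed(jar).\qquad in(b1,jar).\qquad in(b2,jar).\hfill\cr
\qquad bacterium(b1).\qquad bacterium(b2).\hfill\cr
\qquad alive(B)\ \hbox{:-}\ bacterium(B),\ \hbox{not}\ dead(B).\hfill\cr}$$
%
The query $sterile(jar)$ succeeds only after $dead(b1)$ and
$dead(b2)$ have been asserted individually; nothing in the program
corresponds to the universally quantified argument that heating the
sealed jar kills all its bacteria at once.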
My own opinion is that reasoning and problem solving
programs will eventually have to allow the full use of quantifiers
and sets and have strong enough control methods to use them
without combinatorial explosion.
While the 1958 idea was well received, very few attempts
were made to embody it in programs in the immediately following years,
the main one being F.~Black's Harvard PhD thesis of
1964. I spent most of my time on what I regarded as preliminary
projects, mainly LISP. My main reason for not attempting an
implementation was that I wanted to learn how to express common
sense knowledge in logic first. This is still my goal. I might
be discouraged from continuing to pursue it if people pursuing nonlogical
approaches were having significant success in achieving
generality.
[McCarthy and Hayes 1969] made the distinction between
epistemological and heuristic aspects of the AI problem and asserted
that generality is more easily studied epistemologically. The distinction
is that the epistemology is completed when the facts available have
as a consequence that a certain strategy is appropriate to achieve
the goal, while the heuristic problem involves the search that
finds the appropriate strategy.
Implicit in [McCarthy 1960] was the idea of a general purpose
common sense database. The common sense information possessed by humans
would be written as logical sentences and included in the database. Any
goal-seeking program could consult the database for the facts needed
to decide how to achieve its goal. Especially prominent in the database
would be facts about the effects of actions. The much studied example
is the set of facts about the effects of a robot trying to move objects
from one location to another.
This led in the 1960s to
the {\it situation calculus}
[McCarthy and Hayes 1969]
which was intended to provide a way of
expressing the consequences of actions independent of the problem.
The basic formalism of the situation calculus is
%
$$s' = result(e,s),$$
%
which asserts that $s'$ is the situation that results when event $e$
occurs in situation $s$. Here are some situation calculus axioms for
moving and painting blocks.
%
\bigskip
\bigskip
%{\overfullrule=0pt
\centerline{Qualified result-of-action axioms}
%$$\displaylines{∀x l s.clear(top(x),s) ∧ clear(l,s) ∧ ¬tooheavy(x)\hfill\cr
%\hfill ⊃ loc(x,result(move(x,l),s)) = l\cr}$$
$$∀x l s.clear(top(x),s) ∧ clear(l,s) ∧ ¬tooheavy(x)
⊃ loc(x,result(move(x,l),s)) = l$$
\noindent asserting that moving an object $x$ to a location $l$ results
in a new situation $result(move(x,l),s)$ in which $x$ is at location $l$
provided $x$ is not too heavy and both its top and the location $l$ are
clear.
$$∀x c s.color(x,result(paint(x,c),s)) = c$$
\noindent asserting that painting an object $x$ with
paint of color $c$ results in its having that color.
\medskip
\centerline{Frame axioms}
The next four axioms assert that moving an object doesn't change
the color of any object or the location of any other object and that
painting an object doesn't change the locations of any object or the
color of other objects than the one painted.
$$∀x y l s.color(y,result(move(x,l),s)) = color(y,s).$$
$$∀x y l s.y ≠ x ⊃ loc(y,result(move(x,l),s)) = loc(y,s).$$
$$∀x y c s.loc(x,result(paint(y,c),s)) = loc(x,s).$$
$$∀x y c s.y ≠ x ⊃ color(x,result(paint(y,c),s)) = color(x,s).$$
\medskip
Notice that all qualifications to the
performance of the actions are explicit in the premisses and that
statements (called frame axioms)
about what doesn't change when an action is performed
are explicitly included. Without those statements it wouldn't be
possible to infer much about $result(e2,result(e1,s))$, since we
wouldn't know whether the premisses for the event $e2$ to have its
expected result were fulfilled in $result(e1,s)$.
Notice further that the situation calculus applies only when it
is reasonable to reason about discrete events, each of which results
in a new total situation. Continuous events and concurrent events
are not covered.
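For concreteness, the unqualified parts of these axioms translate
almost directly into Prolog, with $loc$ and $color$ written as
three-place relations whose last argument is the situation. The
qualifications ($clear$, $tooheavy$) are omitted for brevity, this
encoding is only one of several in common use, and the paper's
notation is kept, in which $Block1$, etc.\ denote constants.
%
$$\displaylines{
\qquad loc(X,L,result(move(X,L),S)).\hfill\cr
\qquad loc(Y,L1,result(move(X,L),S))\ \hbox{:-}\ Y ≠ X,\ loc(Y,L1,S).\hfill\cr
\qquad color(Y,C,result(move(X,L),S))\ \hbox{:-}\ color(Y,C,S).\hfill\cr
\qquad color(X,C,result(paint(X,C),S)).\hfill\cr
\qquad color(Y,C1,result(paint(X,C),S))\ \hbox{:-}\ Y ≠ X,\ color(Y,C1,S).\hfill\cr
\qquad loc(Y,L,result(paint(X,C),S))\ \hbox{:-}\ loc(Y,L,S).\hfill\cr}$$
%
Given the fact $loc(Block1,table,s0)$, the query
$loc(Block1,Where,result(move(Block1,top(Block2)),s0))$ returns
$Where = top(Block2)$.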
Unfortunately, it wasn't very feasible to use the situation calculus
in the manner proposed, even for problems meeting its restrictions.
In the first place, using general
purpose theorem provers made the programs run too slowly, since
the theorem provers of 1969 [Green 1969] had no way of controlling
the search. This led to STRIPS [Fikes and Nilsson 1971] which
reduced the use of logic to reasoning within a situation. Unfortunately,
the STRIPS formalizations were much more special than full situation calculus.
The facts that were included in the axioms had to be delicately chosen
in order to avoid the introduction of contradictions arising from
the failure to delete a sentence that wouldn't be true in the
situation that resulted from an action.
%
\medskip
\noindent {\bf Non-monotonicity:}
The second problem with the situation calculus axioms is that
they were again not general enough. This was the {\it qualification problem},
and a possible way around it wasn't discovered until the late 1970s.
Consider putting an axiom in a {\it common sense database} asserting that
birds can fly. Clearly the axiom must be {\it qualified} in some way since
penguins, dead birds and birds whose feet are encased in concrete
can't fly. A careful construction of the axiom might succeed in
including the exceptions of penguins and dead birds, but clearly
we can think up as many additional exceptions like birds with
their feet encased in concrete as we like. Formalized non-monotonic
reasoning (see [McCarthy 1980, 1986], [Doyle 1977], [McDermott and
Doyle 1980] and [Reiter 1980]) provides a formal way of saying that
a bird can fly unless there is an abnormal circumstance and reasoning
that only the abnormal circumstances whose existence follows from the
facts being taken into account will be considered.
Non-monotonicity has considerably increased the possibility of
expressing general knowledge about the effects of events in the situation
calculus. It has also provided a way of solving the {\it frame problem},
which constituted another obstacle to generality that was already
noted in [McCarthy and Hayes 1969]. The frame problem (The term has
been variously used, but I had it first.) occurs when there are
several actions available each of which changes certain features
of the situation. Somehow it is necessary to say that an action
changes only the features of the situation to which it directly
refers. When there is a fixed set of actions and features, it can
be explicitly stated which features are unchanged by an action, even
though it may take a lot of axioms. However, if we imagine that
additional features of situations and additional actions may be
added to the database, we face the problem that the axiomatization
of an action is never completed. [McCarthy 1986] indicates how
to handle this using {\it circumscription}, but Lifschitz [1985]
has shown that circumscription needs to be improved and has made
proposals for this.
Here are some situation calculus axioms for moving and
painting blocks taken from [McCarthy 1986].
\bigskip
{\centerline{\bf Moving and painting blocks using circumscription}
\overfullrule=0pt
\bigskip
\centerline{Axioms about locations and the effects of moving objects}
$$∀x e s.¬ab(aspect1(x,e,s)) ⊃ loc(x,result(e,s)) = loc(x,s)$$
\noindent asserting that objects normally don't change their locations.
More specifically, the assertion is that an object doesn't change its
location unless the triple consisting of the object, the event that
occurs and the situation in which it occurs is abnormal in aspect1.
$$∀x l s.ab(aspect1(x,move(x,l),s))$$
\noindent However, moving an object to a location in a situation
is abnormal in aspect1.
$$∀x l s.¬ab(aspect3(x,l,s)) ⊃ loc(x,result(move(x,l),s)) = l$$
\noindent Unless the relevant triple is abnormal in aspect3, the
action of moving an object to a location $l$ results in its being
at $l$.
\bigskip
\centerline{Axioms about colors and painting}
$$∀x e s.¬ab(aspect2(x,e,s)) ⊃ color(x,result(e,s)) = color(x,s)$$
$$∀x c s.ab(aspect2(x,paint(x,c),s))$$
$$∀x c s.¬ab(aspect4(x,c,s)) ⊃ color(x,result(paint(x,c),s)) = c$$}
\noindent These three axioms give the corresponding facts about what changes
the color of an object.
\medskip
This treats the qualification problem, because any number
of conditions that may be imagined as preventing moving or painting
can be added later and asserted to imply the corresponding $ab\ aspect\ldots$.
It treats the frame problem in that we don't have to say
that moving doesn't affect colors and painting locations.
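Although circumscription is not something a Prolog interpreter
performs, its effect on these particular axioms can be crudely
approximated by negation as failure, which treats $ab$ as false
wherever it is not explicitly derivable. Here is a sketch of the
location axioms in that style, using the three-place $loc$ of the
earlier sketch; it is not a faithful implementation of
circumscription.
%
$$\displaylines{
\qquad loc(X,L,result(E,S))\ \hbox{:-}\ \hbox{not}\ ab(aspect1(X,E,S)),\ loc(X,L,S).\hfill\cr
\qquad ab(aspect1(X,move(X,L),S)).\hfill\cr
\qquad loc(X,L,result(move(X,L),S))\ \hbox{:-}\ \hbox{not}\ ab(aspect3(X,L,S)).\hfill\cr}$$
%
A new condition preventing motion can then be added as another
clause for $ab$ without touching the axioms already written.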
Even with formalized non-monotonic reasoning, the general
common sense database still seems elusive. The problem is writing
axioms that satisfy our notions of incorporating the general facts about
a phenomenon. Whenever we tentatively decide on some axioms, we
are able to think of situations in which they don't apply and
a generalization is called for. Moreover, the difficulties that
are thought of are often {\it ad hoc} like that of the bird with
its feet encased in concrete.
%
%\medskip
\vfil\eject
\noindent {\bf Reification:}
Reasoning about knowledge, belief or goals requires
extensions of the domain of objects reasoned about.
For example, a program that does backward chaining on goals
uses them directly as sentences, e.g. $on(Block1,Block2)$,
i.e. the symbol $on$ is used as a predicate constant of the language.
However, a program that wants to say
directly that $on(Block1,Block2)$ should be postponed until
$on(Block2,Block3)$ has been achieved, needs a sentence
like $precedes(on(Block2,Block3),\hfil\break
on(Block1,Block2))$,
and if this is to be a sentence of first-order logic,
then the symbol $on$ must be taken as a function symbol,
and $on(Block1,Block2)$ regarded as an object in the
first order language.
This process of making objects out of sentences
and other entities is called {\it reification}. It is
necessary for expressive power but again leads to complications
in reasoning. It is discussed in [McCarthy 1979].
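As a small illustration (with invented predicate names, and again
keeping the paper's notation for constants), the postponement fact
can be written and used directly once goals are reified as terms:
%
$$\displaylines{
\qquad precedes(on(Block2,Block3),on(Block1,Block2)).\hfill\cr
\qquad postponed(G)\ \hbox{:-}\ precedes(H,G),\ \hbox{not}\ achieved(H).\hfill\cr}$$
%
Here $on(Block1,Block2)$ occurs as a term denoting a goal, not as
an atomic sentence asserted to be true.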
%
\medskip
\noindent {\bf Formalizing the notion of context:}
Whenever we write an axiom, a critic can say that the axiom
is true only in a certain context. With a little ingenuity the
critic can usually devise a more general context in which the precise
form of the axiom doesn't hold.
Looking at human reasoning
as reflected in language emphasizes this point.
Consider axiomatizing ``on'' so as to draw appropriate consequences
from the information expressed in the sentence,
``The book is on the table''. The critic
may propose to haggle about the precise meaning of ``on'' inventing
difficulties about what can be between the book and the table or
about how much gravity there has to be in a spacecraft in order to
use the word ``on'' and whether centrifugal force counts.
Thus we encounter Socratic
puzzles over what the concepts mean in complete generality
and encounter examples that never arise in life. There simply
isn't a most general context.
Conversely, if we axiomatize at
a fairly high level of generality, the axioms are often longer than
is convenient in special situations.
Thus humans find it useful to say, ``The book is on the table,'' omitting reference
to time and
precise identifications of what book and what table.
This problem of how general to be arises whether the general common
sense knowledge is expressed in logic, in program or in some other formalism.
(Some people propose that the knowledge is internally expressed in the
form of examples only, but strong mechanisms using analogy and similarity
permit their more general use. I wish them good fortune in formulating
precise proposals about what these mechanisms are.)
A possible way out involves formalizing the notion of context
and combining it with the circumscription method of non-monotonic
reasoning. We add a context parameter to the functions and predicates
in our axioms. Each axiom makes its assertion about a certain context.
Further axioms tell us that facts are inherited by more restricted
contexts unless exceptions are asserted. Each assertion is also
non-monotonically assumed to apply in any particular more general
context, but again there are exceptions. For example, the rules
about birds flying implicitly assume that there is an atmosphere
to fly in. In a more general context this might not be assumed.
It remains to determine how inheritance to more general contexts
differs from inheritance to more specific contexts.
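The downward half of this inheritance might be sketched in logic
programming style as follows, with $c ≤ c1$ meaning that $c1$ is
the more general context (as below) and an invented $exception$
predicate:
%
$$\displaylines{
\qquad holds(P,C)\ \hbox{:-}\ C ≤ C1,\ holds(P,C1),\ \hbox{not}\ exception(P,C).\hfill\cr}$$
%
A fact asserted in a general context is thereby available in every
more specific context in which no exception to it has been asserted.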
Suppose that whenever
a sentence $p$ is present in the memory of a computer, we consider it
as in a particular context and as an abbreviation for the sentence
$holds(p,C)$ where $C$ is the name of a context. Some contexts are
very specific, so that Watson is a doctor in the context of Sherlock
Holmes stories and a baritone psychologist in a tragic opera about the
history of psychology.
There is a relation $c1 ≤ c2$ meaning that context $c2$ is
more general than context $c1$. We allow sentences like
$holds(c1 ≤ c2,c0)$ so that even statements relating contexts can
have contexts. The theory would not provide for any ``most general
context'' any more than set theory provides for a
most general set.
A logical system using contexts might provide operations
of {\it entering} and {\it leaving} a context yielding
what we might call {\it ultra-natural
deduction} allowing a sequence of reasoning like
%
$$
\vbox{
\displaylines{holds(p,C)\hfill\cr
ENTER\ C\hfill\cr
p\hfill\cr
\qquad \vdots\hfill\cr
q\hfill\cr
LEAVE\ C\hfill\cr
holds(q,C).\hfill\cr}}$$
%
This resembles the usual logical natural deduction systems, but for
reasons beyond the scope of this lecture, it is probably not correct
to regard contexts as equivalent to sets of assumptions --- not
even infinite sets of assumptions.
All this is unpleasantly vague, but it's a lot more
than could be said in 1971.
% references
% msg.msg[1,jmc]/393p Newell message
% s86.in[let,jmc]/393p Newell message
\medskip
\noindent {\bf References:}
{\bf Black, F., (1964)}: {\it A Deductive Question Answering System}, Doctoral
Dissertation, Harvard University.
{\bf Buchanan, B.G. and Shortliffe, E.H., eds. (1984)}:
{\it Rule-Based Expert Systems: The MYCIN Experiments of the Stanford
Heuristic Programming Project}, Addison-Wesley, Reading, Mass.
%{\bf Shortliffe, Edward H. (1976)}:
%Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, NY.
{\bf Davis, Randall; Buchanan, Bruce; and Shortliffe, Edward (1977)}:
``Production Rules as a Representation for a Knowledge-Based Consultation
Program,'' {\it Artificial Intelligence}, Volume 8, Number 1, February.
{\bf Doyle, J. (1977)}: ``Truth Maintenance Systems for Problem Solving,''
Proc. 5th IJCAI, p. 247.
{\bf Ernst, George W. and Allen Newell (1969)}: {\it GPS: A Case Study
in Generality and Problem Solving}, Academic Press.
{\bf Fikes, R. and Nils Nilsson (1971)}:
``STRIPS: A New Approach to the Application of
Theorem Proving to Problem Solving,'' {\it Artificial Intelligence}, Volume 2,
Numbers 3,4, January,
pp. 189-208.
{\bf Friedberg, R.M. (1958)}: ``A Learning Machine,''
{\it IBM Journal of Research}, Volume 2, Number 1, January, pp. 2-13.
{\bf Friedberg, R.M., B. Dunham and J.H. North (1959)}: ``A Learning Machine,
Part II,'' {\it IBM Journal of Research}, Volume 3, Number 3, July, pp. 282-287.
{\bf Green, C. (1969)}: ``Theorem-proving by Resolution as a Basis for
Question Answering Systems'', in {\it Machine Intelligence 4}, pp.
183-205 (eds. Meltzer, B. and Michie, D.). Edinburgh:
Edinburgh University Press.
{\bf Laird, John E., Allen Newell and Paul S. Rosenbloom (1986)}:
``Soar: An Architecture for General Intelligence'', to be published.
{\bf Lifschitz, Vladimir (1985)}: ``Computing Circumscription,''
in Proceedings of the 9th International Joint Conference on Artificial
Intelligence, Volume 1, 1985, pp. 121-127.
{\bf McCarthy, John (1960)}: ``Programs with Common Sense,'' in Proceedings of the
Teddington Conference on the Mechanization of Thought Processes, Her Majesty's
Stationery Office, London. Reprinted in M. Minsky (ed.),
{\it Semantic Information Processing}, M.I.T. Press, Cambridge, Mass.
{\bf McCarthy, John and P.J. Hayes (1969)}: ``Some Philosophical Problems from
the Standpoint of Artificial Intelligence'', in D. Michie (ed), {\it Machine
Intelligence 4}, American Elsevier, New York, NY.
{\bf McCarthy, John (1979)}:
``First Order Theories of Individual Concepts and Propositions'',
in Michie, Donald (ed.) {\it Machine Intelligence 9}, (University of
Edinburgh Press, Edinburgh).
{\bf McCarthy, John (1983)}: ``Some Expert Systems Need Common Sense'',
in {\it Computer Culture: The Scientific, Intellectual and Social Impact
of the Computer}, Heinz Pagels, ed.
vol. 426, Annals of the New York Academy of Sciences.
%paper
%presented at New York Academy of Sciences Symposium.
% common[e83,jmc]
{\bf McCarthy, John (1980)}:
``Circumscription - A Form of Non-Monotonic Reasoning'', {\it Artificial
Intelligence}, Volume 13, Numbers 1,2, April.
% .<<aim 334, circum.new[s79,jmc]>>
{\bf McCarthy, John (1986)}:
``Applications of Circumscription to Formalizing Common Sense Knowledge,''
{\it Artificial Intelligence}, April 1986.
% circum.tex[f83,jmc]
{\bf McDermott, D. and Doyle, J. (1980)}: ``Non-monotonic Logic I.''
{\it Artificial Intelligence}, Volume 13, Numbers 1,2, pp. 41-72.
{\bf Newell, A., J. C. Shaw and H. A. Simon (1957)}:
``Preliminary Description of General
Problem Solving Program-I (GPS-I)'', CIP Working Paper \#7, Carnegie Institute
of Technology, Dec.
%{\bf Newell, A., J. C. Shaw and H. A. Simon (1959)}:
%``Report on a general problem-solving
%program for a computer'', {\it Information Processing: Proceedings of the
%International Conference on Information Processing, UNESCO, Paris, 1960}
%pp 256-264. (RAND P-1584, and reprinted in {\it Computers and
%Automation}, July 1959).
{\bf Newell, A., J. C. Shaw and H. A. Simon (1960)}:
``A Variety
of Intelligent Learning in a General Problem Solver'', in
{\it Self-Organizing Systems\/}, Yovits, M.C. and Cameron, S., eds.,
Pergamon Press, Elmsford, N.Y., pp. 153-189.
{\bf Newell, A. and H. A. Simon (1972)}: {\it Human Problem Solving},
Prentice-Hall.
{\bf Reiter, Raymond (1980)}: ``A Logic for Default Reasoning'', {\it Artificial
Intelligence}, Volume 13, Numbers 1,2, April.
{\bf Simon, Herbert (circa 1960)}: still unsubstantiated rumor.
%GENERA[W86,JMC] TEXed on \jmcdate\ at \theTime.
\vfill\eject\end
Here are some of the many ideas that have been proposed for making
AI systems more general.
1. Logic as a source of generality in AI (1958) [McCarthy 1960]
2. Formalized non-monotonic reasoning (1977) [McCarthy 1980],
[Reiter 1980], [McDermott and Doyle 1980]
3. Formalization of concepts [McCarthy 1979]
4. formalization of contexts (pending)
5. elaboration tolerance (pending)
6. separation of epistemology from heuristics [McCarthy and Hayes 1969]
7. representation of behavior by program [Friedberg 1959]
8. logic programming (1970s)
9. production systems (1970s)
notes:
We need to be able to tell it metamathematics.
modality si, as a matter of generality
generality and lack thereof in Prolog
quote from 1958 paper about generality
[McCarthy 1958] was in part a reaction to the lack of generality
of the Samuel [195x and 19xx] checker program that learned how to
improve its ability to play checkers. Its behavior was determined
by the weights assigned to various features used in evaluating a checker
position. These features included the numbers of men and kings on
each side, control of the center and back rows, other features considered
important by checker players and still others of no previously known
checker importance that Samuel included
to see if the learning program would assign them significant weights.
(It didn't). The program adjusted the weights of the features of
positions in order to predict as well as possible the moves considered
good in a large database of master games. The learning program was
quite successful, and one can conjecture that the ultimate weights
were close to optimal.
However, even the best possible weights for the features considered
provided no way of taking into account certain checker stratagems. For
example, a situation can sometimes be created in which a king belonging
to one side prevents two single men from the other side from advancing.
If this situation can be maintained while on the rest of the board
even exchanges and promotions to kings occur, eventually the side
whose king is holding the two singles will outnumber the other side
by one on the rest of the board. This is a sufficient advantage
to win by a series of exchanges. Lookahead cannot be deep enough to
see the win when the situation is first created. There is probably
no setting of the parameters that will always give sufficient
importance to freeing the two blocked singles without leading to
wrong moves in other situations. The problem is that the space
of values of the Samuel parameters provides an insufficiently general
representation of checker strategy. Moreover, there is no way
of telling the Samuel program about the danger of letting a king
block two singles; the only way to get it in involves rewriting the
program.